ABSTRACT

It has recently been demonstrated that one can accurately derive galaxy morphology from particular 
primary and secondary isophotal shape estimates in the Sloan Digital Sky Survey imaging catalog. 
This was accomplished by applying Machine Learning techniques to the Galaxy Zoo morphology 
catalog. Using the broad bandpass photometry of the Sloan Digital Sky Survey in combination 
with precise knowledge of galaxy morphology should help in estimating more accurate photometric 
redshifts for galaxies. Using the Galaxy Zoo separation for spirals and ellipticals in combination with 
Sloan Digital Sky Survey photometry we attempt to calculate photometric redshifts. In the best 
case we find that the root mean square error for Luminous Red Galaxies classified as ellipticals is as 
low as 0.0118. Given these promising results we believe better photometric redshift estimates for all 
galaxies in the Sloan Digital Sky Survey (~350 million) will be feasible if researchers can also leverage 
their derived morphologies via Machine Learning. These initial results look promising for those 
interested in Weak-Lensing, Baryonic Acoustic Oscillation, and other studies dependent upon 
accurate photometric redshifts. 

Subject headings: galaxies: distances and redshifts — methods: statistical 



1. INTRODUCTION 

It is commonly believed that adding information about 
the morphology of galaxies may help in the estimation 
of Photometric Redshifts (Photo-Zs) when using train- 
ing set methods. Most of this work in recent years has 
utilized The Sloan Digital Sky Survey (SDSS, York et al. 
2000). For example, as discussed in Way et al. (2009, 
hereafter Paper II) many groups have attempted to use 
a number of derived primary and secondary isophotal 
shape estimates in the Sloan Digital Sky Survey imaging 
catalog to help in estimating Photo-Zs. Some examples 
include: the radius containing 50% and/or 90% of 
the Petrosian (1976) flux in the SDSS r band (denoted as 
petroR50_r and petroR90_r in the SDSS catalog), concentration 
index (CI=petroR90_r/petroR50_r), surface brightness, 
axial ratios and radial profile (e.g. Collister & Lahav 
2004; Ball et al. 2004; Wadadekar 2005; Kurtz et al. 
2007; Wray & Gunn 2008). 
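
As a minimal illustration of the concentration index defined above, the following sketch computes CI from the two Petrosian radii. The function name and the example radii are hypothetical and chosen only for illustration; they are not values from the SDSS catalog.

```python
def concentration_index(petro_r90: float, petro_r50: float) -> float:
    """Return CI = petroR90 / petroR50 for radii given in the same units."""
    if petro_r50 <= 0:
        raise ValueError("petroR50 must be positive")
    return petro_r90 / petro_r50


# Hypothetical radii in arcsec (illustrative only, not catalog values).
ci = concentration_index(petro_r90=6.0, petro_r50=2.0)
print(ci)
```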

More recently Singal et al. (2011) have attempted to 
use Galaxy Shape parameters derived from Hubble Space 
Telescope/Advanced Camera for Surveys imaging data 
using a principal components approach and then feeding 
this information into their Neural Network code to predict 
Photo-Zs, but for samples much deeper than the 
SDSS. Unfortunately they find only marginal improvement 
when using their morphology estimators. 

Another promising approach focuses on the reddening 
and inclination of galaxies. Yip et al. (2011) have at- 
tempted to quantify these effects on a galaxy's spectral 
energy distribution (SED). The idea is to use this infor- 
mation to correct the over-estimation of Photo-Zs of disk 
galaxies. 

On the other hand, attempts to morphologically classify 
large numbers of galaxies in the universe have gained in 
accuracy over the past 15 years as better and larger training 
samples from eye classification have become available. For example, 
Lahav et al. (1995) were among the first to use an Artificial 
Neural Network trained on 830 galaxies classified 
by the eyes of six different professional astronomers. In 
more recent years Ball et al. (2004) have attempted to classify 
galaxies by morphological type using a Neural Network 
approach based on a sample of 1399 galaxies (from 
the catalog of Nakamura et al. (2011)). Cheng et al. 
(2011) have used a sample of 984 non-star forming SDSS 
early-type galaxies to distinguish between E, S0 and Sa 
galaxies. In the past year two new attempts at morphological 
classification using Machine Learning techniques 
on a Galaxy Zoo (Lintott et al. 2008, 2011) training sample 
have been published (Banerji et al. 2010; Huertas-Company 
et al. 2011). The Banerji et al. (2010) results 
were impressive in that they claim to obtain classification 
accuracy better than 90% for three different morphological 
classes (spiral, elliptical and point-sources/artifacts). 

These works are in contrast to previous work like that 
of Bernardi et al. (2003) who used a classification scheme 
based on SDSS spectra. However, this classification cer- 
tainly missed some early-type galaxies from their desired 
sample due to the presence of star formation. 

In this paper we will continue our use of Gaussian Pro- 
cess Regression to calculate Photo-Zs, using a variety of 
inputs. This method has been discussed extensively in 
two previous papers (Way & Srivastava 2006; Way et al. 
2009). 

We utilize the SDSS Main Galaxy Sample (MGS, 
Strauss et al. 2002) and the Luminous Red Galaxy Sam- 
ple (LRG, Eisenstein et al. 2001) from the SDSS Data 
Release Seven (DR7, Abazajian et al. 2009). We also 
utilize the Galaxy Zoo 1 survey results (GZ1, Lintott et 
al. 2011). The Galaxy Zoo project 3 (Lintott et al. 2008) 
contains a total of 900,000 SDSS galaxies with morphological 
classifications (Lintott et al. 2011). 

3 http://www.galaxyzoo.org 

While this study does not focus exclusively on the LRG 
sample, it should be noted that if it is possible to im- 
prove the Photo-Z estimates for these objects as shown 
herein it could also improve the estimation of cosmologi- 
cal parameters (e.g. Blake & Bridle 2005; Padmanabhan 
et al. 2007; Percival et al. 2010; Reid et al. 2010; Zunckel, 
Gott & Lunnan 2011) using the SDSS as well as upcoming 
surveys such as BOSS (Cuesta-Vazquez et al. 2011; 
Eisenstein et al. 2011), BigBOSS (Schlegel et al. 2009), 
and possibly Euclid (Sorba & Sawicki 2011), not to mention 
LSST (Ivezic et al. 2008). It could also contribute to 
more reliable Photo-Z errors, as required for weak-lensing 
surveys (Bernstein & Huterer 2010; Kitching, Heavens & 
Miller 2011) and Baryonic Acoustic Oscillation measure- 
ments, which are also dependent upon accurate Photo-Z 
estimation of LRGs (Roig et al. 2008). 

2. DATA 

All of the data used herein have been obtained via the 
SDSS casjobs server. In order to obtain results consistent 
with Paper II for both the MGS and LRG samples 
we use the same photometric quality flags (!BRIGHT 
and !BLENDED and !SATURATED) and redshift quality 
cuts (zConf>0.95 and zWarning=0), but using the SDSS 
DR7 instead of earlier SDSS releases. These data are 
cross-matched in casjobs with columns 14-16 in Table 2 
of Lintott et al. (2011) extracting the galaxies flagged as 
'spiral', 'elliptical' or 'uncertain'. The galaxies "flagged 
as 'elliptical' or 'spiral' require 80 per cent of the vote in 
that category after the debiasing procedure has been ap- 
plied; all other galaxies are flagged 'uncertain'" (Lintott 
et al. 2011). Debiasing is the process of correcting for 
small biases in spin direction and color. See Section 3.1 
in Lintott et al. (2011) for more details on debiasing. 
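
The 80 per cent vote rule quoted above can be sketched as follows. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the inputs are taken to be debiased vote fractions between 0 and 1, and the threshold is treated as inclusive, which the quoted text does not specify.

```python
def gz1_flag(p_elliptical: float, p_spiral: float, threshold: float = 0.8) -> str:
    """Assign a GZ1-style flag from debiased vote fractions.

    Assumption: a class is flagged when its debiased vote fraction reaches
    the threshold (inclusive); everything else is 'uncertain'.
    """
    if p_elliptical >= threshold:
        return "elliptical"
    if p_spiral >= threshold:
        return "spiral"
    return "uncertain"


# Hypothetical vote fractions, for illustration only.
for votes in [(0.85, 0.10), (0.05, 0.92), (0.55, 0.40)]:
    print(votes, gz1_flag(*votes))
```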

Note that the GZ1 sample is based upon the MGS, but 
the MGS contains LRGs as well. This is why we can analyze 
both of these samples. However, the actual LRG survey 
goes fainter than the MGS and so we do not find LRG 
galaxies fainter than the MGS limit of r_petrosian ≤ 17.77. 
See Strauss et al. (2002) and Eisenstein et al. (2001) for 
details on the MGS and LRG samples. 

Table 1. Results 

Data (a)   Inputs (b)          rmse 50% (c)   rmse 10% (c)   rmse 90% (c)
MGS-ELL    ugriz+Q+U           0.01561        0.01532        0.01620
           ugriz+P50+CI        0.01407        0.01400        0.01475
           ugriz+P50+CI+Q+U    0.01641        0.01560        0.01801
           ugriz+B             0.01679        0.01668        0.01683
MGS-SP     ugriz+Q+U           0.01889        0.01864        0.01913
           ugriz+P50+CI        0.01938        0.01927        0.01947
           ugriz+P50+CI+Q+U    0.01751        0.01747        0.01777
           ugriz+B             0.02092        0.02089        0.02101
LRG-ELL    ugriz+Q+U           0.01345        0.01291        0.01420
           ugriz+P50+CI        0.01334        0.01278        0.01426
           ugriz+P50+CI+Q+U    0.01584        0.01439        0.01693
           ugriz+B             0.01180        0.01175        0.01184
LRG-SP     ugriz+Q+U           0.01520        0.01404        0.01910
           ugriz+P50+CI        0.01514        0.01474        0.01679
           ugriz+P50+CI+Q+U    0.01957        0.01870        0.02285
           ugriz+B             0.01737        0.01728        0.01765

a MGS=Main Galaxy Sample (Strauss et al. 2002), 
LRG=Luminous Red Galaxies (Eisenstein et al. 2001), 
SP=Classified as spiral by Galaxy Zoo, ELL=Classified as 
elliptical by Galaxy Zoo 

b u-g-r-i-z=5 SDSS dereddened magnitudes, P50=Petrosian 
50% light radius in SDSS i band, CI=Concentration Index 
(P90/P50), Q=Stokes Q value in i band, U=Stokes U 
value in i band, B=Inputs from Table 2 of Banerji et al. 
(2010)=CI,mRrCc_i,aE_i,mCr4_i,texture_i 

c We quote the bootstrapped 50%, 10% and 90% confidence levels 
as in Paper II for the root mean square error (rmse) 



Fig. 1. — Redshift and r-band dereddened model magnitudes 
for the Main Galaxy Sample (top two panels) and Luminous Red 
Galaxies (bottom two panels). 

A number of points from both the LRG and MGS 
were eliminated because of either bad values (e.g. -9999) 
or because they were considered outliers from the main 
distribution of points. The former offenders included: 
petroR90_i (13 points in the MGS sample, 1 point in 
the LRG), mE1_i (43 points, 5 points), petroR90Err_i 
(7177 points, 1262 points), mRrCcErr_i (22 points, 12 
points). The reason for eliminating bad mE1_i points 
is that we use it for calculating aE_i from Table 2 of 
Banerji et al. (2010). A small number of outliers were 
also removed from the MGS sample, but totalled only 27 
points. No such outlier points were removed in the LRG 
sample. This leaves us with a total of 437,273 MGS and 
68,996 LRG objects. Using the GZ1 classifications in 
the MGS there are 45,249 ellipticals, 119,369 spirals and 
272,655 uncertain (~ 62%). For the LRG sample there 
are 27,227 ellipticals and 13,495 spirals leaving 28,274 
uncertain (~41%). 
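
The sample sizes and uncertain fractions quoted above can be checked with a few lines of arithmetic. The counts are those given in the text; the dictionary layout and function name are only illustrative.

```python
# Counts as quoted in the text for the GZ1 classifications.
mgs = {"elliptical": 45249, "spiral": 119369, "uncertain": 272655}
lrg = {"elliptical": 27227, "spiral": 13495, "uncertain": 28274}


def total_and_uncertain_fraction(counts):
    """Return the total number of objects and the fraction flagged uncertain."""
    total = sum(counts.values())
    return total, counts["uncertain"] / total


mgs_total, mgs_frac = total_and_uncertain_fraction(mgs)
lrg_total, lrg_frac = total_and_uncertain_fraction(lrg)
print(mgs_total, round(100 * mgs_frac))
print(lrg_total, round(100 * lrg_frac))
```

The totals reproduce the 437,273 MGS and 68,996 LRG objects stated above, with roughly 62% and 41% of each sample left uncertain.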

3. DISCUSSION 

Using the morphological classifications from the 
Galaxy Zoo project first data release (Lintott et al. 2011) 
we attempt to calculate Photo-Zs for four different samples 
and four combinations of primary and secondary isophotal 
shape estimates from the SDSS, as seen in Table 1. A 
larger variety of input combinations was tried, including 
those in Table 1 of Banerji et al. (2010). However, we 
only report those found with the lowest root mean square 
error (rmse) in Table 1 of this paper. 
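
For readers unfamiliar with the quantities reported in Table 1, the sketch below shows one way to compute an rmse and its bootstrapped 10%, 50% and 90% levels. It is an assumption-laden illustration rather than the procedure of Paper II: it resamples a fixed set of predictions instead of retraining the regression, uses a nearest-rank quantile, and runs on synthetic data. All function and variable names are hypothetical.

```python
import math
import random


def rmse(z_spec, z_phot):
    """Root mean square error between spectroscopic and photometric redshifts."""
    n = len(z_spec)
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(z_spec, z_phot)) / n)


def bootstrap_rmse(z_spec, z_phot, n_resamples=500, seed=0):
    """Return the (10%, 50%, 90%) levels of the rmse over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(z_spec)
    values = []
    for _ in range(n_resamples):
        # Draw n galaxies with replacement and recompute the rmse.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(rmse([z_spec[i] for i in idx], [z_phot[i] for i in idx]))
    values.sort()

    def level(q):
        # Nearest-rank estimate of the q-th quantile of the sorted values.
        return values[min(n_resamples - 1, int(q * n_resamples))]

    return level(0.10), level(0.50), level(0.90)


# Synthetic data for illustration only: true redshifts plus Gaussian scatter.
_rng = random.Random(1)
z_true = [_rng.uniform(0.0, 0.25) for _ in range(200)]
z_pred = [z + _rng.gauss(0.0, 0.015) for z in z_true]
low, median, high = bootstrap_rmse(z_true, z_pred)
print(low, median, high)
```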

Fig. 2. Plots of root mean square error for a given number of 
galaxies per 50% bootstrap level with representative errors (10% 
and 90%). Main Galaxy Sample (top two panels elliptical and 
spiral) and Luminous Red Galaxies (bottom two panels elliptical 
and spiral). 

Fig. 3. Plots of spectroscopic redshift versus predicted photometric 
redshift for the input with the lowest rmse for each of the 
four given data sets shown in Table 1. Recovered panel titles: 
MGS elliptical: ugriz+P50+CI; MGS spiral: ugriz+P50+CI+Q+U. 

The results using the isophotal shape estimates suggested by 
Banerji et al. (2010), as well as others tested in Paper II, 
are found in Figure 2 and Table 1. In Figure 3 we 
also show plots of the spectroscopic redshift versus the 
predicted photometric redshift for the inputs that pre- 
dict the lowest rmse for each of the four data sets listed 
in Table 1. These are more impressive than one might 
initially guess. In Paper II we showed how adding addi- 
tional bandpasses in the ultraviolet via the Galaxy Evolu- 
tion Explorer 7 (GALEX, Martin et al. 2005) could naively 
improve Photo-Z estimation. The same was shown when 
using additional bandpasses from the infrared from the 
Two Micron All Sky Survey 8 (2MASS, Skrutskie et al. 
2006). However, the results were biased because neither 
GALEX nor 2MASS reaches the same magnitude or redshift 
depth as the full SDSS MGS or LRG samples. It is 
easier to get lower rmse estimates of Photo-Z when you 
have a smaller range of lower redshifts to fit. For the 
MGS it is clear from the top two panels in Figure 1 that 
the Galaxy Zoo objects span a similar range of redshifts 
and r-band magnitudes. On the other hand the situation 
for the Luminous Red Galaxies is not as straightforward. 

7 http://www.galex.caltech.edu 

8 http://www.ipac.caltech.edu/2mass 



Looking at the bottom two panels of Figure 1, the large 
second bump at a redshift of z~0.35 and r~18 does not 
exist for the Galaxy Zoo objects. This is expected because 
the Galaxy Zoo catalog was drawn from the MGS and hence there are no 
galaxies beyond r_petrosian = 17.77 (see Petrosian (1976) 
for details on Petrosian magnitudes) according to their 
selection criteria (Strauss et al. 2002). 

Our lowest rmse values come from galaxies categorized 
as ellipticals in the Luminous Red Galaxy Sample using 
the SDSS u-g-r-i-z bandpass filters and the isophotal 
shape estimates from Table 2 of Banerji et al. (2010): CI, 
mRrCc_i, aE_i, mCr4_i, texture_i. These yield an rmse 
of only 0.01180, which we believe is the lowest calculated 
to date for such a large sample of galaxies measured in 
the bandpasses of the SDSS while also retaining a fairly 
large range of redshifts (0 < z < 0.25) and dereddened 
magnitudes (12 < r_petrosian < 17.77). 

Taking a closer look at the kinds of inputs that im- 
prove the results by galaxy type can be interesting. It is 
clear from Table 1 that the Stokes parameters appear to 
work better for spiral than elliptical galaxies. The Stokes 
parameters measure the axis ratio and position angle of 
galaxies as projected on the sky. In detail they are flux- 
weighted second moments of a particular isophote. 



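$$ M_{xx} \equiv \left\langle \frac{x^2}{r^2} \right\rangle, \quad M_{yy} \equiv \left\langle \frac{y^2}{r^2} \right\rangle, \quad M_{xy} \equiv \left\langle \frac{xy}{r^2} \right\rangle \qquad (1) $$

When the isophotes are self-similar ellipses one finds 
(Stoughton et al. 2002): 

$$ Q \equiv M_{xx} - M_{yy} = \frac{a-b}{a+b}\cos(2\phi), \quad U \equiv M_{xy} = \frac{a-b}{a+b}\sin(2\phi) \qquad (2) $$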



The semi-major and semi-minor axes are a and b while 
\phi is the position angle. Masters et al. (2010) demonstrate 
the efficacy of using SDSS derived axis ratios in 
characterizing the inclinations of spiral galaxies. This is 
seen in Table 1, where the Stokes parameters offer the second 
best set of inputs when determining photometric redshifts for spirals. 
Both Stokes Q & U parameters also display a larger range 
of values in the spirals than in the ellipticals. The stan- 
dard deviations in Stokes Q & U for spirals are 0.1877 
& 0.1500 while for ellipticals they are 0.0596 & 0.0459. 
Hence they clearly offer more room for possible improve- 
ment in the former than in the latter. 
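
To make the link between the Stokes parameters and galaxy shape concrete, the sketch below assumes the self-similar ellipse relations of Stoughton et al. (2002), Q = (a-b)/(a+b) cos(2 phi) and U = (a-b)/(a+b) sin(2 phi), and inverts them to recover the axis ratio b/a and position angle. The function names and example values are hypothetical.

```python
import math


def stokes_from_ellipse(a: float, b: float, phi_deg: float):
    """Stokes Q, U for semi-axes a >= b and position angle phi in degrees."""
    ratio = (a - b) / (a + b)
    phi = math.radians(phi_deg)
    return ratio * math.cos(2 * phi), ratio * math.sin(2 * phi)


def ellipse_from_stokes(q: float, u: float):
    """Recover the axis ratio b/a and position angle (degrees) from Q, U."""
    ratio = math.hypot(q, u)  # equals (a - b) / (a + b)
    axis_ratio = (1 - ratio) / (1 + ratio)
    phi_deg = 0.5 * math.degrees(math.atan2(u, q))
    return axis_ratio, phi_deg


# Hypothetical ellipse with b/a = 0.5 at a position angle of 30 degrees.
q, u = stokes_from_ellipse(a=2.0, b=1.0, phi_deg=30.0)
print(q, u, ellipse_from_stokes(q, u))
```

Under these relations a round object (a = b) has Q = U = 0, so a narrow spread in Q and U, as found for the ellipticals, corresponds to a population with axis ratios close to one.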

One of the more surprising results is the difference in 
using the B inputs for the MGS versus LRG ellipticals. 
In the latter case these inputs give the lowest rmse 
results, while in the MGS elliptical case they give the 
worst. This could be due to the fact that the surface 
brightness of the LRG galaxies is more easily modeled 
by the B inputs than that of the MGS galaxies. The MGS 
ellipticals may still have clumps of star formation that can make the 
surface brightness more difficult to model than that of the more 
passive LRG ellipticals. 

When comparing the MGS and LRG spirals one stark 
difference is clear when utilizing the P50 (Petrosian 50% 
light radius in SDSS i band) and CI (Concentration 
Index=P90/P50) inputs shown in Table 1. In the MGS spiral 
case these additional inputs yield worse fits, whereas 
they are among the most useful in the LRG spiral case. 
This may indicate that MGS spirals are more diverse 
morphologically than LRG spirals. In that case the P50 and CI 
inputs would be incapable of helping to model the MGS spiral 
diversity and would simply add noise rather than signal to the 
fits. Masters et al. (2010) point out that red spirals 
(read LRG type) will "be dominated by inclined dust reddened 
spirals, and spirals with large bulges." Note that 
this does not mean that LRG bulge dominated spirals are 
necessarily S0 galaxies (which would add to their diversity 
both morphologically and spectroscopically). Lintott 
et al. (2008) and Bamford et al. (2009) have both shown 
that contamination of S0s into spirals is only about 3% 
in the best case scenario. So again, perhaps P50 and CI 
can do a better job of modeling LRG spirals because they 
are less diverse than MGS spirals. 

There are several outstanding issues with using this 
approach for studies that may utilize large samples of 
SDSS LRG derived Photo-Zs (e.g. Baryonic Acoustic 
Oscillations). The first is that the GZ1 catalog has only 
been able to classify ~59% of the LRG galaxies as spiral 
or elliptical. This means that 41% of our sample 
cannot benefit from morphology knowledge when esti- 
mating Photo-Zs. Secondly, the LRGs used herein do 
not go to the same depth (in redshift or magnitude) as 
the full LRG (r<19) catalog since the GZ1 is based on 
the MGS (r<17.77). Note also that the GZ1 morphol- 
ogy estimates get worse as one reaches the fainter end of 
the sample (Lintott et al. 2008). Thirdly, the Machine 
Learning derived morphologies of Banerji et al. (2010) 
can only classify up to 90% as accurately as their 'by 
eye' GZ1 training set. These constraints will have to be 
taken into account for any studies that attempt to utilize 
morphology in Photo-Z calculations. 

The Photo-Z code used to generate the results in 
this paper is available on the NASA Ames Dashlink web 
site https://dashlink.arc.nasa.gov/algorithm/stablegp 
and is described in Foster et al. (2009). 